[SPARK-25676][SQL][TEST] Rename and refactor BenchmarkWideTable to use main method #22823
Conversation
`bin/spark-submit --class <this class> <spark sql test jar>` -> `bin/spark-submit --class <this class> --jars <spark core test jar> <spark sql test jar>`
It seems that this is not a good way, cc @dongjoon-hyun
How about adding `val splitThreshold = SQLConf.get.getConfString("spark.testing.codegen.splitThreshold", "1024").toInt` to our "To run this benchmark" section?
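For illustration, a minimal, self-contained sketch of the suggested pattern: read a string config with a default and convert it to an `Int`. `ConfSketch` here is a hypothetical stand-in for Spark's `SQLConf` (which lives in `org.apache.spark.sql.internal` and is not reproduced here); only the shape of `getConfString(key, default)` is assumed from the suggestion above.

```scala
// Hypothetical stand-in for SQLConf, backed by a plain mutable Map.
object ConfSketch {
  private val settings = scala.collection.mutable.Map[String, String]()

  // Mirrors the shape of SQLConf.getConfString(key, default):
  // return the stored value, or the default when the key is unset.
  def getConfString(key: String, default: String): String =
    settings.getOrElse(key, default)

  def set(key: String, value: String): Unit = settings(key) = value
}

object ConfSketchDemo {
  def main(args: Array[String]): Unit = {
    // Falls back to "1024" when the key is unset, as in the suggestion.
    val splitThreshold =
      ConfSketch.getConfString("spark.testing.codegen.splitThreshold", "1024").toInt
    println(splitThreshold)
  }
}
```

The point of the string-typed default is that the benchmark instructions can show a one-line snippet users paste in, without adding a typed config entry.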
@wangyum Thanks for the suggestion! You prefer modifying CodeGenerator.scala each time we run this benchmark, right? I feel it could be tricky to ask users to modify the code, and if CodeGenerator.scala changes in the future, it will be hard to keep the documentation here up to date. @dongjoon-hyun @gengliangwang any suggestions?
In this case, we need advice from the right persons. :)
Personally I don't think this is a good solution. We should start a new discussion about whether to make it configurable in production as well.
`"10", "100", "1024", "8196", "65536"` -> `"10", "100", "1024", "2048", "4096", "8196", "65536"`?
`BenchmarkWideTable` -> `WideTableBenchmark`?
Test build #98013 has finished for PR 22823 at commit
Thanks @wangyum for the good suggestion!
Test build #98018 has finished for PR 22823 at commit
Test build #98022 has finished for PR 22823 at commit
Hi, @davies and @cloud-fan and @kiszk .
This benchmark was added in Spark 2.1.0. The value 1k was determined by manually changing the split threshold.
This PR wants to add a configuration in CodeGenerator.scala for testing purposes only.
- Is the configuration helpful for general purposes?
- If so, can we make another PR for that first?
- If not, is it acceptable to add this testing-only parameter?
I think we should have a PR to add this config officially. It should be useful for performance tuning.
Thank you for the decision, @cloud-fan !
@yucai . Please proceed to propose a new PR for only that new configuration (if you haven't started yet).
@dongjoon-hyun I am working on #22847.
Test build #98208 has finished for PR 22823 at commit
…nction configurable

## What changes were proposed in this pull request?

As per the discussion in [apache#22823](https://github.com/apache/spark/pull/22823/files#r228400706), add a new configuration to make the split threshold for the code generated function configurable. When the generated Java function source code exceeds `spark.sql.codegen.methodSplitThreshold`, it will be split into multiple small functions.

## How was this patch tested?

manual tests

Closes apache#22847 from yucai/splitThreshold.

Authored-by: yucai <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
@dongjoon-hyun Just pushed the rebased version, thanks!
Thank you, @yucai . Could you update the title because we are renaming it? Maybe,
Test build #98501 has finished for PR 22823 at commit
Update result
Test build #98509 has finished for PR 22823 at commit
split threshold 10                           38932 / 39307          0.0       37128.1       1.0X
split threshold 100                          31991 / 32556          0.0       30508.8       1.2X
split threshold 1024                         10993 / 11041          0.1       10483.5       3.5X
split threshold 2048                          8959 / 8998           0.1        8543.8       4.3X
@dongjoon-hyun On my Mac, in most cases, 2048 is the best.
I tested it with OpenJDK (in a Linux VM); 2048 is also the best.
================================================================================================
projection on wide table
================================================================================================
OpenJDK 64-Bit Server VM 1.8.0_171-b10 on Linux 3.10.0-693.11.1.el7.x86_64
Intel Core Processor (Haswell)
projection on wide table: Best/Avg Time(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------
split threshold 10 23995 / 25673 0.0 22883.7 1.0X
split threshold 100 12881 / 13419 0.1 12284.3 1.9X
split threshold 1024 6435 / 7402 0.2 6137.2 3.7X
split threshold 2048 5861 / 6766 0.2 5589.2 4.1X
split threshold 4096 6464 / 7825 0.2 6164.6 3.7X
split threshold 8192 7886 / 8742 0.1 7520.7 3.0X
split threshold 65536 46143 / 48029 0.0 44005.6 0.5X
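For readers unfamiliar with this output format: the `Relative` column is simply the baseline's best time divided by each case's best time. A small sketch, using the best times from the OpenJDK run above (the object name `RelativeSpeedup` is just for illustration):

```scala
// Sketch: derive the "Relative" column from best times in milliseconds.
// The baseline is the first case (split threshold 10), best time 23995 ms.
object RelativeSpeedup {
  def relative(baselineMs: Double, caseMs: Double): Double = baselineMs / caseMs

  def main(args: Array[String]): Unit = {
    val baseline = 23995.0
    // split threshold 2048 has best time 5861 ms -> roughly 4.1X, matching the table
    println(f"${relative(baseline, 5861.0)}%.1fX")
  }
}
```

This is why the reviewers compare rows by the `Relative` column rather than raw times: it stays comparable even though absolute times differ across machines.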
@yucai . I trust you. :) Don't worry about that. The scope of this PR is not for choosing the best option.
Test build #98519 has finished for PR 22823 at commit
retest this please
dongjoon-hyun
left a comment
@yucai , @cloud-fan , @gatorsmile , @gengliangwang .
This PR looks good to me. This refactoring is not for choosing the best default value for the configuration; that is beyond its scope.
The results on different machines (JVM/CPU/disk) are not the same. This PR clearly shows the benefit of @yucai 's earlier configuration PR, #22847. Users can run benchmarks (including this one) with various configurations on their own machines and workloads, and choose the best values with that configuration.
Test build #98522 has finished for PR 22823 at commit
Merged to
Thank you all.
Seq("10", "100", "1024", "2048", "4096", "8192", "65536").foreach { n =>
  benchmark.addCase(s"split threshold $n", numIters = 5) { iter =>
    withSQLConf(SQLConf.CODEGEN_METHOD_SPLIT_THRESHOLD.key -> n) {
      df.selectExpr(columns: _*).foreach(identity(_))
    }
  }
}
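The `withSQLConf` helper above scopes each threshold to a single benchmark case. A minimal sketch of that set-then-restore pattern, under the assumption of a map-backed config (`WithConfSketch` and `withConf` are hypothetical stand-ins, not Spark's actual implementation):

```scala
// Hypothetical sketch of a withSQLConf-style helper: set a config key for the
// duration of `body`, then restore the previous value (or remove the key), so
// benchmark cases do not leak settings into each other.
object WithConfSketch {
  private val conf = scala.collection.mutable.Map[String, String]()

  def get(key: String): Option[String] = conf.get(key)

  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally previous match {
      case Some(v) => conf(key) = v   // restore the old value
      case None    => conf.remove(key) // key was unset before; unset it again
    }
  }
}
```

The restore happens in a `finally` block, so even a failing benchmark case leaves the configuration as it found it.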
Hi, All.
It turns out that this breaks the Scala 2.12 build. I made a PR to fix that: #22970
I see, thanks!
…e main method

## What changes were proposed in this pull request?

Refactor BenchmarkWideTable to use main method. Generate benchmark result:

```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideTableBenchmark"
```

## How was this patch tested?

manual tests

Closes apache#22823 from yucai/BenchmarkWideTable.

Lead-authored-by: yucai <[email protected]>
Co-authored-by: Yucai Yu <[email protected]>
Co-authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
What changes were proposed in this pull request?

Refactor BenchmarkWideTable to use main method. Generate benchmark result:

SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain org.apache.spark.sql.execution.benchmark.WideTableBenchmark"

How was this patch tested?

manual tests